Parallel Computations for Hierarchical Agglomerative Clustering using CUDA Fast and Scalable Computations on Graphics Processors
Authors
Abstract
The Graphics Processing Units (GPUs) in today’s desktops can be regarded as high-performance parallel processors. Traditionally, parallel computing is the use of multiple computing resources to work on a computational problem simultaneously. Such computations are possible on multi-core CPUs, on computers with multiple CPUs, or on a network of computers working in parallel. Today’s GPUs can simultaneously use multiple internal computing resources, such as ‘core-processors’ or ‘multi-processors’, to finish a computation in a fraction of the time a CPU would need. We explore the parallel architecture of the GPU for cost-effective desktop parallel computing of a core data mining problem, clustering; the approach could then be applied to parallelize other data mining computations. The launch of NVIDIA’s Compute Unified Device Architecture (CUDA) technology has been a catalyst for the rapid growth in the use of GPUs to parallelize scientific and data mining computations. With CUDA, the skills and techniques needed to invoke the internal parallel processors of a GPU are accessible to scientific researchers who may not be expert graphics programmers. We apply CUDA-based programming to parallelize the traditional Hierarchical Agglomerative Clustering (HAC) algorithm and demonstrate speed gains over the CPU. Speed gains from 15 times up to about 90 times have been realized for various clustering conditions. The effects of CUDA blocks and the challenges involved in invoking graphics hardware for such data mining algorithms are discussed. Notably, a block size of 8 is optimal for a GPU with 128 internal processors. We further discuss the research issues that arise when parallelizing HAC on a GPU with CUDA and propose the use of the GPU as an efficient desktop processor. The results suggest that the future of desktop parallel computing based on GPUs is promising.
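For readers unfamiliar with HAC, the following is a minimal sequential sketch of the traditional algorithm in Python, not the CUDA implementation the abstract describes. The abstract does not state the distance measure or linkage rule, so Euclidean distance and single linkage are assumptions here, and the function names `pairwise_distances` and `hac_single_linkage` are hypothetical. The all-pairs distance computation and the repeated search for the closest pair are the steps that a GPU version would plausibly parallelize.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair (i, j) with i < j.

    Assumption: Euclidean distance; the abstract does not name a metric.
    """
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def hac_single_linkage(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Assumption: single linkage (distance between two clusters is the
    smallest distance between any two of their members).
    Returns (clusters, merges); clusters hold point indices.
    """
    dist = pairwise_distances(points)
    clusters = {i: [i] for i in range(len(points))}
    merges = []

    while len(clusters) > num_clusters:
        # Find the closest pair of active clusters (keys keep a < b).
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        merges.append((a, b, d))

        # Merge cluster b into cluster a.
        del dist[(a, b)]
        clusters[a].extend(clusters.pop(b))

        # Single-linkage update: distance to the merged cluster is the
        # smaller of the distances to its two parts.
        for k in clusters:
            if k == a:
                continue
            key_a = (min(a, k), max(a, k))
            key_b = (min(b, k), max(b, k))
            dist[key_a] = min(dist[key_a], dist.pop(key_b))

    return sorted(sorted(members) for members in clusters.values()), merges


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    clusters, merges = hac_single_linkage(sample, 3)
    print(clusters)
    print(merges)
```

The sketch stores distances in a dictionary for readability; an implementation aimed at a GPU would more likely keep a dense distance matrix in device memory so that each thread can handle one entry.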
Similar resources
An Approach for Fast Hierarchical Agglomerative Clustering Using Graphics Processors with CUDA
The Graphics Processing Units in today’s desktops can be regarded as high-performance parallel processors. Each single processor within the GPU is able to execute different tasks independently but concurrently. Such computational capabilities of the GPU are being exploited in the domain of data mining. Two types of hierarchical clustering algorithms are realized on the GPU using CUDA. Speed gai...
High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. There are many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...
Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform
Two HPEC Challenge benchmarks, finite impulse response and QR decomposition, were implemented on an NVIDIA 8800 GTX graphics card using a data-parallel implementation approach. For the finite impulse response filter bank benchmark, a fast convolution FFT-based frequency-domain approach on the GPU performed 4 to 35 times faster than the comparable calculation on a CPU. A non-transform time-domain...
Exploiting parallelism to support scalable hierarchical clustering
A distributed memory parallel version of the group average Hierarchical Agglomerative Clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard TREC test collection, our par...
Publication date: 2014